Integrating Open Sources and Relational Data with SPARQL
Authors
Abstract
We believe that the ability to use SPARQL as a front end to heterogeneous data, without significant cost in performance or expressive power, is key to RDF taking its rightful place as the lingua franca of data integration. To this end, we demonstrate how RDF and SPARQL can tackle a mix of standard relational workload and data mining over public data sources. We discuss extending SPARQL for business intelligence (BI) workloads and relate our experiences running SPARQL against relational and native RDF databases. We use the well-known TPC-H benchmark as our reference schema and workload. We define a mapping of the TPC-H schema to RDF and restate the TPC-H queries in BI-extended SPARQL; for this purpose, we define aggregation and nested queries for SPARQL. We demonstrate that the TPC-H workload restated in SPARQL can be run against an existing RDBMS without loss of performance or expressivity and without changes to the RDBMS. Finally, we demonstrate how to combine TPC-H data or XBRL financial reports with RDF data from the CIA Factbook and DBpedia.
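The abstract says the TPC-H schema is mapped to RDF and the queries are restated using SPARQL aggregation and nested queries, but it does not give the mapping or the query syntax. The Python sketch below illustrates the general idea under stated assumptions: a hypothetical `http://example.org/tpch#` namespace, a simple one-triple-per-column mapping of a few made-up LINEITEM rows, and hand-written evaluation of one GROUP BY aggregate and one nested (sub-select) query. The SPARQL in the docstrings uses standard SPARQL 1.1 syntax, which may differ from the BI extensions the authors define. All function names are illustrative. This is not a SPARQL engine.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical namespace: the paper's actual TPC-H-to-RDF IRIs are not given in the abstract.
TPCH = "http://example.org/tpch#"

# Made-up toy LINEITEM rows (not real TPC-H data):
# (l_orderkey, l_linenumber, l_returnflag, l_quantity, l_extendedprice)
LINEITEM = [
    (1, 1, "N", 17, Decimal("21168.23")),
    (1, 2, "N", 36, Decimal("45983.16")),
    (2, 1, "R", 38, Decimal("44694.46")),
]


def lineitem_to_triples(rows):
    """Map each relational row to a subject IRI built from its composite
    primary key, emitting one triple per column (direct-mapping style).
    Python values stand in for typed RDF literals."""
    triples = []
    for orderkey, linenumber, flag, qty, price in rows:
        subj = f"{TPCH}lineitem/{orderkey}/{linenumber}"
        triples.append((subj, "rdf:type", f"{TPCH}LineItem"))
        # Foreign key to ORDERS becomes a link to another resource.
        triples.append((subj, f"{TPCH}l_orderkey", f"{TPCH}order/{orderkey}"))
        triples.append((subj, f"{TPCH}l_returnflag", flag))
        triples.append((subj, f"{TPCH}l_quantity", qty))
        triples.append((subj, f"{TPCH}l_extendedprice", price))
    return triples


def sum_price_by_returnflag(triples):
    """Hand-evaluated equivalent of:
        SELECT ?flag (SUM(?price) AS ?total) (COUNT(*) AS ?n)
        WHERE { ?li a tpch:LineItem ;
                    tpch:l_returnflag ?flag ;
                    tpch:l_extendedprice ?price . }
        GROUP BY ?flag
    """
    # Index triples by subject; assumes each property is single-valued here.
    by_subject = defaultdict(dict)
    for s, p, o in triples:
        by_subject[s][p] = o
    totals = defaultdict(lambda: [Decimal("0"), 0])
    for props in by_subject.values():
        if props.get("rdf:type") != f"{TPCH}LineItem":
            continue
        flag = props[f"{TPCH}l_returnflag"]
        totals[flag][0] += props[f"{TPCH}l_extendedprice"]
        totals[flag][1] += 1
    return {flag: (total, n) for flag, (total, n) in sorted(totals.items())}


def lineitems_above_average_price(triples):
    """Hand-evaluated nested query, equivalent to:
        SELECT ?li WHERE {
          ?li tpch:l_extendedprice ?price .
          { SELECT (AVG(?p) AS ?avg) WHERE { ?x tpch:l_extendedprice ?p } }
          FILTER (?price > ?avg)
        }
    """
    prices = {s: o for s, p, o in triples if p == f"{TPCH}l_extendedprice"}
    avg = sum(prices.values()) / len(prices)  # inner sub-select yields one ?avg row
    return sorted(s for s, price in prices.items() if price > avg)


triples = lineitem_to_triples(LINEITEM)
print(sum_price_by_returnflag(triples))
print(lineitems_above_average_price(triples))
```

Indexing by subject plays the role of the star-shaped join on `?li` in the WHERE clause; in a relational back end, the same query would typically be translated into a GROUP BY over the original LINEITEM table, which is the kind of rewriting the abstract reports doing without changes to the RDBMS.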
Similar resources
Integrating Heterogeneous Data Source Using Ontology
Integrating data from multiple heterogeneous sources entails dealing with different data models, schemas and query languages. The burgeoning Semantic Web has provided several new methods for data integration. This paper focuses on the integration of relational databases and XML data. To solve this problem, we propose an ontology-based approach. A semantic integration infrastructure for heterogeneous dat...
GeoKnow: Making the Web an Exploratory Place for Geospatial Knowledge
ologies and technologies have strengthened their position in the areas of data and knowledge management. Standards for organizing and querying semantic information, such as RDF(S) and SPARQL are adopted by large academic communities, while corporate vendors adopt semantic technologies to organize, expose, exchange and retrieve their data as Linked Data [1]. RDF stores have become robust enough ...
OpenFlyData: An exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster
Motivation: Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. Howe...
Korean Linked Data on the Web: Text to RDF
Interlinking data coming from different sources has been a long-standing goal [4], aiming to increase reusability, discoverability and, as a result, the usefulness of information. Nowadays, Linked Open Data (LOD) tackles this issue in the context of the Semantic Web. However, most web data is currently stored in relational databases and published as unstructured text. This triggers the need of...
Distributed query processing for federated RDF data management
The publication of freely available and machine-readable information has increased significantly in recent years. The Linked Data initiative in particular has received a lot of attention. Linked Data is based on the Resource Description Framework (RDF), and anybody can simply publish their data in RDF and link it to other datasets. The structure is similar to the World Wide Web where indivi...